During the last half decade, convolutional neural networks (CNNs) have achieved remarkable success in semantic segmentation, a core task of various emerging industrial applications such as autonomous driving and medical imaging. However, training CNNs requires a huge amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNN models on photo-realistic synthetic data with computer-generated annotations. Despite this, the domain mismatch between real images and synthetic data significantly decreases the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in semantic segmentation. The curriculum domain adaptation solves easy tasks first in order to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban traffic scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train the segmentation network in such a way that the network predictions in the target domain follow those inferred properties. In experiments, our method significantly outperforms the baselines as well as the only known existing approach to the same problem.
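The following is a minimal sketch, not the authors' implementation, of how the curriculum constraint described above could be imposed in practice: a standard supervised segmentation loss on the synthetic (source) data is combined with a term that pushes the network's predicted global label distribution on a real (target) image toward the distribution inferred by the easy task. All function names, the loss weighting, and the use of a cross-entropy-style matching term are illustrative assumptions.

```python
# Illustrative sketch only: assumes PyTorch tensors and a segmentation model
# producing (B, C, H, W) logits. Not the authors' released code.
import torch
import torch.nn.functional as F


def global_label_distribution(logits):
    """Average per-pixel softmax over an image to get a C-dim label distribution.

    logits: (B, C, H, W) raw network outputs.
    Returns: (B, C) predicted image-level label frequencies.
    """
    probs = F.softmax(logits, dim=1)      # per-pixel class probabilities
    return probs.mean(dim=(2, 3))         # average over spatial locations


def curriculum_loss(target_logits, inferred_dist, eps=1e-8):
    """Match predicted label distributions to the easy-task estimates.

    target_logits: (B, C, H, W) network outputs on target-domain images.
    inferred_dist: (B, C) label distributions estimated by the easy task.
    """
    pred_dist = global_label_distribution(target_logits)
    # Cross-entropy between inferred and predicted distributions (assumed form).
    return -(inferred_dist * torch.log(pred_dist + eps)).sum(dim=1).mean()


def training_step(model, src_images, src_labels, tgt_images, tgt_inferred_dist,
                  lambda_curriculum=0.1):
    """Hypothetical joint objective: source supervision + target distribution matching."""
    src_logits = model(src_images)
    seg_loss = F.cross_entropy(src_logits, src_labels, ignore_index=255)
    tgt_logits = model(tgt_images)
    cur_loss = curriculum_loss(tgt_logits, tgt_inferred_dist)
    return seg_loss + lambda_curriculum * cur_loss
```

The same matching term could, in principle, be applied per landmark superpixel by averaging the softmax outputs over each superpixel's pixels instead of over the whole image; the image-level version is shown here only for brevity.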